Symbolic Chemical Representations

The Wolfram Language provides several data structures for representing chemical species at different levels of granularity.

Start with a small BioSequence representing a single codon.

In[42]:=

bioseq=BioSequence["RNA","AUG"]

Out[42]=

BioSequence[Type: RNA Sequence, Content: AUG (3 letters)]



From this bio sequence you can create a ChemicalFormula or a Molecule, depending on your application.

In[54]:=

form=ChemicalFormula@bioseq

Out[54]=

C₂₉H₃₆N₁₂O₁₉P₂

In[53]:=

mol=Molecule@bioseq

Out[53]=

Molecule[Formula: C₂₉H₃₆N₁₂O₁₉P₂, Atoms: 98, Bonds: 105]



Equivalence between different representations can be checked easily using MoleculeMatchQ.

In[57]:=

{MoleculeMatchQ[mol,bioseq],MoleculeMatchQ[mol,form]}

Out[57]=

{True,True}

As the simplest representation, the formula allows you to find molecular mass and elemental composition.

In[62]:=

form[{"MolecularMass","ElementCounts"}]

Out[62]=



{918.6 u, <|carbon -> 29, hydrogen -> 36, nitrogen -> 12, oxygen -> 19, phosphorus -> 2|>}

The molecule represents all atoms and bonds explicitly and allows computing topological properties or even generating a 3D structure.

In[63]:=

mol[{"AromaticRingCount","HBondDonorCount"}]

Out[63]=

{5,11}

In[64]:=

MoleculePlot3D[mol,PlotTheme->"Spacefilling"]

Out[64]=

[3D space-filling plot of the molecule]

The bio sequence representation allows computation at a higher level of abstraction. Reverse-transcribe this RNA sequence into DNA, or translate it into a peptide.

In[66]:=

BioSequenceTranscribe[bioseq]

Out[66]=

BioSequence[Type: DNA Sequence, Content: ATG (3 letters)]



In[65]:=

BioSequenceTranslate[bioseq]

Out[65]=

BioSequence[Type: Peptide Sequence, Content: M (1 letter)]
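The same two functions can be combined on other sequences. The sketch below is illustrative only: the names rna, dna and twoCodons, and the six-letter sequence "AUGGCC", are made up for this example. It assumes that BioSequenceTranscribe also converts a DNA sequence back to RNA, so that applying it twice is expected to return a sequence equivalent to the original.

```wolfram
(* Rebuild the single-codon RNA sequence used above *)
rna = BioSequence["RNA", "AUG"];

(* RNA -> DNA, as in the example above *)
dna = BioSequenceTranscribe[rna];

(* Assumption: transcribing the DNA should lead back to an RNA sequence *)
BioSequenceTranscribe[dna]

(* A hypothetical two-codon sequence; translation should yield one residue per codon *)
twoCodons = BioSequence["RNA", "AUGGCC"];
BioSequenceTranslate[twoCodons]
```

Working at the sequence level keeps these conversions cheap, since no atoms or bonds need to be constructed until a Molecule is requested.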

